MRG: revise documentation structure; add internals page. #2184
Codecov Report
@@ Coverage Diff @@
## latest #2184 +/- ##
=======================================
Coverage 86.27% 86.27%
=======================================
Files 130 130
Lines 14750 14750
Branches 2623 2623
=======================================
Hits 12725 12725
Misses 1724 1724
Partials 301 301
doc/sourmash-internals.md

## Signatures and sketches

sourmash operates on sketches. Each sketch is a collection of hashes.
Suggested change:
- sourmash operates on sketches. Each sketch is a collection of hashes.
+ sourmash operates on sketches. Each sketch is a collection of hashes. Each hash is a whole number product of a k-mer and hash function.
Maybe link the hash function to the "## making sketches" section of the document. You may wish to include another line about k-mers as well, such as:
"A k-mer is a short sub-sequence generated from a larger sequence file via a sliding window approach.
For example, with k=3, an 8 base pair DNA molecule yields 6 individual 3-mers. Each 3-mer is digested into a hash, and the hashes populate a sketch that is stored in a signature.
The DNA sequence:
AGTCATCG
The k-mers of the sliding window process:
[AGT].....
.[GTC]....
..[TCA]...
...[CAT]..
....[ATC].
.....[TCG]
Hashes digested from each k-mer:
2118360698
1681319365
65865673
1255238040
364627471
659934874
"
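The sliding-window step in the comment above can be sketched in a few lines of Python. This is only an illustration: the helper name `kmers` is hypothetical and not part of sourmash, and the hash values listed in the comment are not reproduced here because the comment does not say which hash function or settings produced them.

```python
def kmers(sequence: str, k: int) -> list[str]:
    """Return every length-k substring, sliding the window one base at a time."""
    # A sequence of length n contains n - k + 1 overlapping k-mers.
    return [sequence[i:i + k] for i in range(len(sequence) - k + 1)]


# The 8 bp example from the comment, with k=3, yields 8 - 3 + 1 = 6 k-mers.
print(kmers("AGTCATCG", 3))
# → ['AGT', 'GTC', 'TCA', 'CAT', 'ATC', 'TCG']
```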
we could point at https://sourmash.readthedocs.io/en/latest/kmers-and-minhash.html for a basic intro, rather than rehashing (hah!) k-mers. What do you think?
Here are my revisions to the intro para - thoughts?
sourmash operates on sketches. Each sketch is a collection of hashes,
which are in turn built from k-mers by applying a hash function
(currently always murmurhash) and a filtering function. Each sketch
is contained in a signature wrapper that contains some metadata.
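A rough sketch of the "hash function plus filtering function" idea in the revised paragraph, under loudly stated assumptions: `hash_kmer` uses `hashlib.blake2b` as a stand-in 64-bit hash (the text says sourmash uses murmurhash), the filter shown is a simple "keep hashes below a threshold" rule controlled by a hypothetical `scaled` parameter, and details such as reverse-complement handling are omitted. None of the names below are sourmash API.

```python
import hashlib


def hash_kmer(kmer: str) -> int:
    """Map a k-mer to a 64-bit integer (stand-in hash, NOT murmurhash)."""
    digest = hashlib.blake2b(kmer.encode("ascii"), digest_size=8).digest()
    return int.from_bytes(digest, "big")


def sketch_hashes(sequence: str, k: int, scaled: int) -> set[int]:
    """Hash every k-mer and keep only hashes that pass the filter."""
    # Assumed filter: keep roughly 1/scaled of the 64-bit hash space.
    max_hash = 2**64 // scaled
    kept = set()
    for i in range(len(sequence) - k + 1):
        h = hash_kmer(sequence[i:i + k])
        if h < max_hash:
            kept.add(h)
    return kept


everything = sketch_hashes("AGTCATCG", k=3, scaled=1)  # filter keeps all hashes
subsample = sketch_hashes("AGTCATCG", k=3, scaled=4)   # filter keeps a subset
print(len(everything), len(subsample))
```

Because the filter depends only on the hash value, the same k-mer is kept or dropped consistently across different sequences, which is what makes filtered sketches comparable to each other.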
thanks! Co-authored-by: ccbaumler <63077899+ccbaumler@users.noreply.github.com>
## How do k-mer-based analyses compare with read mapping?

tl;dr very well! But it's a bit one sided: if k-mers match, reads will
map, but not necessarily vice versa. So read mapping rates are almost always
What is the explicit 'vice versa'? 'if reads map, kmers will match?' What does that mean?
I guess I just don't understand the fundamentals of this section; it may be worth adding a longer section before the tl;dr.
exactly: if reads map, k-mers may not match. Will think on how and if to improve text!
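One way to see the "not vice versa" direction is a toy example: a read that differs from the reference by a single base can share no exact k-mers with it, even though a read mapper that tolerates mismatches would typically still place it. The sequences, the value of k, and the helper names below are hypothetical illustrations, not sourmash code.

```python
def kmer_set(sequence: str, k: int) -> set[str]:
    """Collect the distinct k-mers in a sequence."""
    return {sequence[i:i + k] for i in range(len(sequence) - k + 1)}


def mismatches(a: str, b: str) -> int:
    """Count positions at which two equal-length sequences differ."""
    return sum(x != y for x, y in zip(a, b))


reference = "AGTCATCG"
read = "AGTCTTCG"  # one substitution (A -> T) in the middle of the read
k = 5

# Every 5-mer of this 8 bp read overlaps the substituted base,
# so no exact k-mer is shared with the reference.
shared = kmer_set(reference, k) & kmer_set(read, k)
print(len(shared), mismatches(reference, read))
# → 0 1
```

With real reads and k-mer sizes the effect is less extreme, but the asymmetry is the same: exact k-mer matches imply an alignable region, while an alignable read with scattered mismatches may contribute few or no matching k-mers.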
I ran out of time to go over internals but I am most excited about that document. What do you think about including sections that explain the columns of the output csv files?
Gather csv file column explanation:
PS. I agree with taylor about the navigation... And this is really nice documentation. Thanks for writing it.
thanks! Co-authored-by: ccbaumler <63077899+ccbaumler@users.noreply.github.com>
Updated! Curious what you think of the new approach @ccbaumler @taylorreiter.
ok, figured it all out - here's the latest index page + sidebar:
I like the simpler navigation using the markdown headers with the internal links in the sections. In addition, there are three universal pages I can think of that people would want to get to quickly from the landing page:
Could those three be included in the navigation... Maybe in the "Detailed usage:" section in the sidebar?
added in 8144c3c @ccbaumler - thx for the nudge! should be up shortly on the RTD pull request build.
I looked at the page and this is the second paragraph:
I'm not sure what more to add, or how to change it. Maybe a change to the title of the page?
…o update_doc_structure
OK, I'm going to merge this and then deal with remaining comments later - it's gotten big and unwieldy to update at this point, so I don't want to delay it more!
Follow-up to #2184 with minor corrections.
This PR rearranges the docs according to the https://diataxis.fr/ structure, per #2054.
New pages:
Fixes #2054 (document restructuring)
Fixes #932 (add an FAQ)
Fixes #2760 (tax preferred to lca)
Tackles #1227 (what is gather)
Fixes #971 (funding acks)
Fixes #1289 (p_match and p_query)
Fixes #1531 (document memory tradeoffs in save formats)
Fixes #1532 (order of database load/reporting)
Fixes #1609 (better gather description, f_unique_query, etc.)
Fixes #2170 (use detection)
Fixes #1881 (correlation with read mapping)
Fixes #2566 (retrieving reads)
Fixes #2775 (vision & mission)